Random Projection for High Dimensional Data Clustering: A Cluster Ensemble Approach
Authors
Abstract
We investigate how random projection can best be used for clustering high dimensional data. Random projection has been shown to have promising theoretical properties. In practice, however, we find that it results in highly unstable clustering performance. Our solution is to use random projection in a cluster ensemble approach. Empirical results show that the proposed approach achieves better and more robust clustering performance than both single runs of random projection followed by clustering and clustering with PCA, a traditional dimensionality reduction method for high dimensional data. To gain insight into the performance improvement obtained by our ensemble method, we analyze and identify the influence of the quality and the diversity of the individual clustering solutions on the final ensemble performance.
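The abstract does not spell out the ensemble mechanics, so the following is only a minimal sketch of one common way to combine random projection with a cluster ensemble. It rests on several assumptions. Each run projects the data with a fresh Gaussian random matrix and clusters the projection with plain k-means. The runs are then combined through a co-association matrix, which records the fraction of runs in which two points share a cluster. Finally, that matrix is cut into k groups by average-link agglomerative clustering. The choice of k-means, the co-association consensus, and all function names (`random_projection`, `kmeans`, `coassociation`, `average_link`, `rp_ensemble`) are illustrative and hypothetical, not the authors' exact method.

```python
import math
import random

# Assumption: k-means + co-association + average-link stand in for the
# paper's exact base clusterer and consensus function.

def random_projection(data, target_dim, rng):
    """Project each row onto target_dim dimensions with a Gaussian matrix.
    Entries ~ N(0, 1/target_dim), so squared distances are preserved in expectation."""
    d = len(data[0])
    scale = 1.0 / math.sqrt(target_dim)
    r = [[rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)] for _ in range(d)]
    return [[sum(x[i] * r[i][j] for i in range(d)) for j in range(target_dim)] for x in data]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = [list(c) for c in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def coassociation(labelings, n):
    """Entry (i, j) = fraction of runs that put points i and j in the same cluster."""
    m = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(labelings)
    return m

def average_link(sim, k):
    """Merge the pair of clusters with the highest average similarity until k remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best, pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s /= len(clusters[a]) * len(clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(sim)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def rp_ensemble(data, k, target_dim, runs, seed=0):
    """Cluster many random projections, then build a consensus partition."""
    rng = random.Random(seed)
    labelings = [kmeans(random_projection(data, target_dim, rng), k, rng)
                 for _ in range(runs)]
    return average_link(coassociation(labelings, len(data)), k)
```

For example, with two well-separated Gaussian blobs in 50 dimensions, `rp_ensemble(data, k=2, target_dim=10, runs=10)` should give each blob a single label. Averaging over many independent projections is what is meant to smooth out the run-to-run instability that the abstract reports for single projections.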
Similar sources
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Ensemble Fuzzy Clustering using Cumulative Aggregation on Random Projections
Random projection is a popular method for dimensionality reduction due to its simplicity and efficiency. In the past few years, random projection and fuzzy c-means based cluster ensemble approaches have been developed for high dimensional data clustering. However, they require large amounts of space for storing a big affinity matrix, and incur large computation time while clustering in this aff...
Ensemble Clustering of High Dimensional Data with FastMap Projection
In this paper, we propose an ensemble clustering method for high dimensional data which uses FastMap projection to generate subspace component data sets. In comparison with popular random sampling and random projection, FastMap projection preserves the clustering structure of the original data in the component data sets so that the performance of ensemble clustering is improved significantly. W...
Cluster Ensembles for High Dimensional Clustering: An Empirical Study
This paper studies cluster ensembles for high dimensional data clustering. We examine three different approaches to constructing cluster ensembles. To address high dimensionality, we focus on ensemble construction methods that build on two popular dimension reduction techniques, random projection and principal component analysis (PCA). We present evidence showing that ensembles generated by ran...
Projective clustering of high dimensional data
Clustering of high-dimensional data can be problematic, because the usual notions of distance or similarity break down for data in high dimensions. More specifically, it can be shown that, as the number of dimensions increases, the distance to the nearest point approaches the distance to the farthest one. Two approaches are common for dealing with this problem. The idea behind the first approac...
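The distance-concentration claim in the "Projective clustering of high dimensional data" entry can be checked numerically. The sketch below is a hypothetical illustration, not code from that paper. It draws points uniformly from the unit hypercube and reports the relative contrast, defined as (farthest distance minus nearest distance) divided by nearest distance, as seen from a random query point. The function name `relative_contrast`, the uniform data, and the sample size are all assumptions. `math.dist` requires Python 3.8 or later.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest Euclidean distance from a random query
    point to n_points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, p))
    return (max(dists) - min(dists)) / min(dists)

# Illustrative sweep: the contrast shrinks as the dimension grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

The printed contrast should fall sharply as `dim` grows. This is the effect that makes nearest-neighbour style similarity less informative in high dimensions and motivates reducing dimensionality before clustering.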